Results 1 - 20 of 41
1.
Nucleic Acids Res ; 51(D1): D1405-D1416, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36624666

ABSTRACT

The Illuminating the Druggable Genome (IDG) project aims to improve our understanding of understudied proteins and our ability to study them in the context of disease biology by perturbing them with small molecules, biologics, or other therapeutic modalities. Two main products from the IDG effort are the Target Central Resource Database (TCRD) (http://juniper.health.unm.edu/tcrd/), which curates and aggregates information, and Pharos (https://pharos.nih.gov/), a web interface for users to extract and visualize data from TCRD. Since the 2021 release, TCRD/Pharos has focused on developing visualization and analysis tools that help reveal higher-level patterns in the underlying data. The current iterations of TCRD and Pharos enable users to perform enrichment calculations based on subsets of targets, diseases, or ligands and to create interactive heat maps and UpSet charts of many types of annotations. Using several examples, we show how to address disease biology and drug discovery questions through enrichment calculations and UpSet charts.


Subject(s)
Databases, Factual , Molecular Targeted Therapy , Proteome , Humans , Biological Products , Drug Discovery , Internet , Proteome/drug effects
2.
Front Artif Intell ; 5: 932665, 2022.
Article in English | MEDLINE | ID: mdl-36034595

ABSTRACT

Rare diseases (RDs) have, by definition, a low prevalence, which poses a major challenge because less data is available to support preclinical and clinical studies. There has been a vast improvement in our understanding of RDs, largely owing to advanced big data analytic approaches in genetics/genomics. Consequently, a large volume of RD-related publications has accumulated in recent years, which offers an opportunity to use these publications to access the full spectrum of scientific research and to support further investigation of RDs. In this study, we systematically analyzed, semantically annotated, and scientifically categorized RD-related PubMed articles, and integrated those semantic annotations into a knowledge graph (KG), which is hosted in Neo4j and based on a predefined data model. Having demonstrated the KG's scientific contribution to RD research through case studies, we propose as a next step to extend the current effort by including more RD-related publications and other types of resources.

3.
Bioorg Med Chem ; 56: 116588, 2022 02 15.
Article in English | MEDLINE | ID: mdl-35030421

ABSTRACT

Membrane permeability plays an important role in oral drug absorption. Caco-2 and Madin-Darby Canine Kidney (MDCK) cell culture systems have been widely used for assessing intestinal permeability. Since most drugs are absorbed passively, the Parallel Artificial Membrane Permeability Assay (PAMPA) has gained popularity as a low-cost and high-throughput method in early drug discovery when compared to high-cost, labor-intensive cell-based assays. At the National Center for Advancing Translational Sciences (NCATS), PAMPA pH 5 is employed as one of the Tier I absorption, distribution, metabolism, and elimination (ADME) assays. In this study, we have developed a quantitative structure-activity relationship (QSAR) model using our ∼6500 compound PAMPA pH 5 permeability dataset. Along with ensemble decision tree-based methods such as Random Forest and eXtreme Gradient Boosting, we employed a deep neural network and a graph convolutional neural network to model PAMPA pH 5 permeability. The classification models trained on a balanced training set provided accuracies ranging from 71% to 78% on the external set. Of the four classifiers, the graph convolutional neural network that directly operates on molecular graphs offered the best classification performance. Additionally, an ∼85% correlation was obtained between PAMPA pH 5 permeability and in vivo oral bioavailability in mice and rats. These results suggest that data from this assay (experimental or predicted) can be used to rank-order compounds for preclinical in vivo testing with a high degree of confidence, reducing cost and attrition as well as accelerating the drug discovery process. Additionally, experimental data for 486 compounds (PubChem AID: 1645871) and the best models have been made publicly available (https://opendata.ncats.nih.gov/adme/).


Subject(s)
Betamethasone/pharmacokinetics , Dexamethasone/pharmacokinetics , Ranitidine/pharmacokinetics , Verapamil/pharmacokinetics , Administration, Oral , Animals , Betamethasone/administration & dosage , Biological Availability , Caco-2 Cells , Cell Membrane Permeability/drug effects , Cells, Cultured , Dexamethasone/administration & dosage , Dogs , Dose-Response Relationship, Drug , Humans , Hydrogen-Ion Concentration , Madin Darby Canine Kidney Cells , Mice , Molecular Structure , Neural Networks, Computer , Ranitidine/administration & dosage , Rats , Structure-Activity Relationship , Verapamil/administration & dosage
4.
Nucleic Acids Res ; 50(D1): D1307-D1316, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34648031

ABSTRACT

The United States has a complex regulatory scheme for marketing drugs. Understanding drug regulatory status is a daunting task that requires integrating data from many sources from the United States Food and Drug Administration (FDA), US government publications, and other processes related to drug development. At NCATS, we created Inxight Drugs (https://drugs.ncats.io), a web resource that attempts to address this challenge in a systematic manner. NCATS Inxight Drugs incorporates and unifies a wealth of data, including those supplied by the FDA and from independent public sources. The database offers a substantial amount of manually curated literature data unavailable from other sources. Currently, the database contains 125 036 product ingredients, including 2566 US approved drugs, 6242 marketed drugs, and 9684 investigational drugs. All substances are rigorously defined according to the ISO 11238 standard to comply with existing regulatory standards for unique drug substance identification. A special emphasis was placed on capturing manually curated and referenced data on treatment modalities and semantic relationships between substances. A supplementary resource, 'Novel FDA Drug Approvals', features regulatory details of newly approved FDA drugs. The database is regularly updated using the NCATS Stitcher data integration tool, which automates data aggregation, and supports full data access through a RESTful API.


Subject(s)
Databases, Factual , Databases, Pharmaceutical , Pharmaceutical Preparations/classification , United States Food and Drug Administration , Humans , National Center for Advancing Translational Sciences (U.S.) , Translational Research, Biomedical/classification , United States
5.
Orphanet J Rare Dis ; 16(1): 483, 2021 11 18.
Article in English | MEDLINE | ID: mdl-34794473

ABSTRACT

BACKGROUND: Limited knowledge and unclear underlying biology of many rare diseases pose significant challenges to patients, clinicians, and scientists. To address these challenges, there is an urgent need to inspire and encourage scientists to propose and pursue innovative research studies that aim to uncover the genetic and molecular causes of more rare diseases and ultimately to identify effective therapeutic solutions. A clear understanding of current research efforts, knowledge/research gaps, and funding patterns as scientific evidence is crucial to systematically accelerate the pace of research discovery in rare diseases, which is an overarching goal of this study. METHODS: To semantically represent NIH funding data for rare diseases and advance its use of effectively promoting rare disease research, we identified NIH funded projects for rare diseases by mapping GARD diseases to the project based on project titles; subsequently we presented and managed those identified projects in a knowledge graph using Neo4j software, hosted at NCATS, based on a pre-defined data model that captures semantics among the data. With this developed knowledge graph, we were able to perform several case studies to demonstrate scientific evidence generation for supporting rare disease research discovery. RESULTS: Of 5001 rare diseases belonging to 32 distinct disease categories, we identified 1294 diseases that are mapped to 45,647 distinct, NIH-funded projects obtained from the NIH ExPORTER by implementing semantic annotation of project titles. To capture semantic relationships presenting amongst mapped research funding data, we defined a data model comprised of seven primary classes and corresponding object and data properties. A Neo4j knowledge graph based on this predefined data model has been developed, and we performed multiple case studies over this knowledge graph to demonstrate its use in directing and promoting rare disease research. 
CONCLUSION: We developed an integrative knowledge graph with rare disease funding data and demonstrated its use as a source from where we can effectively identify and generate scientific evidence to support rare disease research. With the success of this preliminary study, we plan to implement advanced computational approaches for analyzing more funding related data, e.g., project abstracts and PubMed article abstracts, and linking to other types of biomedical data to perform more sophisticated research gap analysis and identify opportunities for future research in rare diseases.


Subject(s)
Biomedical Research , Rare Diseases , Humans , Pattern Recognition, Automated
6.
J Infect Dis ; 224(12 Suppl 2): S204-S208, 2021 09 01.
Article in English | MEDLINE | ID: mdl-34469558

ABSTRACT

The quantitative polymerase chain reaction (qPCR) method presented in this study allows the identification of pneumococcal capsular serotypes in cerebrospinal fluid without first performing DNA extraction. This testing approach, which saves time and resources, demonstrated similar sensitivity and a high level of agreement between cycle threshold values when it was compared side-by-side with the standard qPCR method with extracted DNA.


Subject(s)
Multiplex Polymerase Chain Reaction/methods , Pneumococcal Infections , Streptococcus pneumoniae/genetics , Humans , Pneumococcal Infections/diagnosis , Serogroup , Serotyping , Streptococcus pneumoniae/isolation & purification
7.
Drug Metab Dispos ; 49(9): 822-832, 2021 09.
Article in English | MEDLINE | ID: mdl-34183376

ABSTRACT

Cytochrome P450 enzymes are responsible for the metabolism of >75% of marketed drugs, making it essential to identify the contributions of individual cytochromes P450 to the total clearance of a new candidate drug. Overreliance on one cytochrome P450 for clearance carries a high risk of drug-drug interactions; and considering that several human cytochrome P450 enzymes are polymorphic, it can also lead to highly variable pharmacokinetics in the clinic. Thus, it would be advantageous to understand the likelihood of new chemical entities to interact with the major cytochrome P450 enzymes at an early stage in the drug discovery process. Typical screening assays using human liver microsomes do not provide sufficient information to distinguish the specific cytochromes P450 responsible for clearance. In this regard, we experimentally assessed the metabolic stability of ∼5000 compounds for the three most prominent xenobiotic metabolizing human cytochromes P450, i.e., CYP2C9, CYP2D6, and CYP3A4, and used the data sets to develop quantitative structure-activity relationship models for the prediction of high-clearance substrates for these enzymes. The screening library included the NCATS Pharmaceutical Collection, comprising clinically approved low-molecular-weight compounds, and an annotated library consisting of drug-like compounds. To identify inhibitors, the library was screened against a luminescence-based cytochrome P450 inhibition assay; and through cross-referencing hits from the two assays, we were able to distinguish substrates and inhibitors of these enzymes. The best substrate and inhibitor models (balanced accuracies ∼0.7), as well as the data used to develop these models, have been made publicly available (https://opendata.ncats.nih.gov/adme) to advance drug discovery across all research groups.
SIGNIFICANCE STATEMENT: In drug discovery and development, drug candidates with indiscriminate cytochrome P450 metabolic profiles are considered advantageous, since they provide less risk of potential issues with cytochrome P450 polymorphisms and drug-drug interactions. This study developed robust substrate and inhibitor quantitative structure-activity relationship models for the three major xenobiotic metabolizing cytochromes P450, i.e., CYP2C9, CYP2D6, and CYP3A4. The use of these models early in drug discovery will enable project teams to strategize or pivot when necessary, thereby accelerating drug discovery research.


Subject(s)
Cytochrome P-450 CYP2C9/metabolism , Cytochrome P-450 CYP2D6/metabolism , Cytochrome P-450 CYP3A/metabolism , Drug Development/methods , Enzyme Inhibitors , Biocatalysis , Drug Discovery/methods , Drug Interactions , Enzyme Inhibitors/chemistry , Enzyme Inhibitors/pharmacokinetics , Humans , Inactivation, Metabolic , Metabolic Clearance Rate , Quantitative Structure-Activity Relationship
8.
SLAS Discov ; 26(10): 1326-1336, 2021 12.
Article in English | MEDLINE | ID: mdl-34176369

ABSTRACT

Problems with drug ADME are responsible for many clinical failures. By understanding the ADME properties of marketed drugs and modeling how chemical structure contributes to these inherent properties, we can help new projects reduce their risk profiles. Kinetic aqueous solubility, the parallel artificial membrane permeability assay (PAMPA), and rat liver microsomal stability constitute the Tier I ADME assays at the National Center for Advancing Translational Sciences (NCATS). Using recent data generated from in-house lead optimization Tier I studies, we update quantitative structure-activity relationship (QSAR) models for these three endpoints and validate in silico performance against a set of marketed drugs (balanced accuracies range between 71% and 85%). Improved models and experimental datasets are of direct relevance to drug discovery projects and, together with the prediction services that have been made publicly available at the ADME@NCATS web portal (https://opendata.ncats.nih.gov/adme/), provide important tools for the drug discovery community. The results are discussed in light of our previously reported ADME models and state-of-the-art models from scientific literature.


Subject(s)
Pharmaceutical Preparations/chemistry , Animals , Drug Discovery/methods , Models, Biological , National Center for Advancing Translational Sciences (U.S.) , Quantitative Structure-Activity Relationship , Rats , Translational Science, Biomedical/methods , United States
9.
Nucleic Acids Res ; 49(D1): D1160-D1169, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33151287

ABSTRACT

DrugCentral is a public resource (http://drugcentral.org) that serves the scientific community by providing up-to-date drug information, as described in previous papers. The current release includes 109 newly approved (October 2018 through March 2020) active pharmaceutical ingredients in the US, Europe, Japan and other countries; and two molecular entities (e.g. mefuparib) of interest for COVID19. New additions include a set of pharmacokinetic properties for ∼1000 drugs, and a sex-based separation of side effects, processed from FAERS (FDA Adverse Event Reporting System); as well as a drug repositioning prioritization scheme based on the market availability and intellectual property rights for FDA-approved drugs. In the context of the COVID19 pandemic, we also incorporated REDIAL-2020, a machine learning platform that estimates anti-SARS-CoV-2 activities, as well as the 'drugs in news' feature, which offers a brief enumeration of the most interesting drugs at the present moment. The full database dump and data files are available for download from the DrugCentral web portal.


Subject(s)
Antiviral Agents/therapeutic use , COVID-19 Drug Treatment , Databases, Pharmaceutical/statistics & numerical data , Drug Approval/statistics & numerical data , Drug Discovery/statistics & numerical data , Drug Repositioning/statistics & numerical data , SARS-CoV-2/drug effects , Antiviral Agents/adverse effects , Antiviral Agents/pharmacokinetics , COVID-19/epidemiology , COVID-19/virology , Drug Approval/methods , Drug Discovery/methods , Drug Repositioning/methods , Epidemics , Europe , Humans , Information Storage and Retrieval/methods , Internet , Japan , SARS-CoV-2/physiology , United States
10.
Nucleic Acids Res ; 49(D1): D1179-D1185, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33137173

ABSTRACT

The US Food and Drug Administration (FDA) and the National Center for Advancing Translational Sciences (NCATS) have collaborated to publish rigorous scientific descriptions of substances relevant to regulated products. The FDA has adopted the global ISO 11238 data standard for the identification of substances in medicinal products and has populated a database to organize the agency's regulatory submissions and marketed products data. NCATS has worked with FDA to develop the Global Substance Registration System (GSRS) and produce a non-proprietary version of the database for public benefit. In 2019, more than half of all new drugs in clinical development were proteins, nucleic acid therapeutics, polymer products, structurally diverse natural products or cellular therapies. While multiple databases of small molecule chemical structures are available, this resource is unique in its application of regulatory standards for the identification of medicinal substances and its robust support for other substances in addition to small molecules. This public, manually curated dataset provides unique ingredient identifiers (UNIIs) and detailed descriptions for over 100 000 substances that are particularly relevant to medicine and translational research. The dataset can be accessed and queried at https://gsrs.ncats.nih.gov/app/substances.


Subject(s)
Databases, Chemical , Databases, Factual , Databases, Pharmaceutical , Public Health/legislation & jurisprudence , Biological Products/chemistry , Biological Products/classification , Datasets as Topic , Drugs, Investigational/chemistry , Drugs, Investigational/classification , Humans , Internet , Nucleic Acids/chemistry , Nucleic Acids/classification , Polymers/chemistry , Polymers/classification , Prescription Drugs/chemistry , Prescription Drugs/classification , Proteins/chemistry , Proteins/classification , Public Health/methods , Small Molecule Libraries/chemistry , Small Molecule Libraries/classification , Software , United States , United States Food and Drug Administration , Xenobiotics/chemistry , Xenobiotics/classification
11.
Nucleic Acids Res ; 49(D1): D1334-D1346, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33156327

ABSTRACT

In 2014, the National Institutes of Health (NIH) initiated the Illuminating the Druggable Genome (IDG) program to identify and improve our understanding of poorly characterized proteins that can potentially be modulated using small molecules or biologics. Two resources produced from these efforts are the Target Central Resource Database (TCRD) (http://juniper.health.unm.edu/tcrd/) and Pharos (https://pharos.nih.gov/), a web interface to browse the TCRD. The ultimate goal of these resources is to highlight and facilitate research into currently understudied proteins by aggregating a multitude of data sources, ranking targets based on the amount of data available, and presenting data in a machine-learning-ready format. Since the 2017 release, both TCRD and Pharos have produced two major releases, which have incorporated or expanded an additional 25 data sources. Recently incorporated data types include human and viral-human protein-protein interactions, protein-disease and protein-phenotype associations, and drug-induced gene signatures, among others. These aggregated data have enabled us to generate new visualizations and content sections in Pharos, in order to empower users to find new areas of study in the druggable genome.


Subject(s)
Databases, Factual , Genome, Human , Neurodegenerative Diseases/genetics , Proteomics/methods , Software , Virus Diseases/genetics , Animals , Anticonvulsants/chemistry , Anticonvulsants/therapeutic use , Antiviral Agents/chemistry , Antiviral Agents/therapeutic use , Biological Products/chemistry , Biological Products/therapeutic use , Data Mining/statistics & numerical data , Host-Pathogen Interactions/drug effects , Host-Pathogen Interactions/genetics , Humans , Internet , Machine Learning/statistics & numerical data , Mice , Mice, Knockout , Molecular Targeted Therapy/methods , Neurodegenerative Diseases/classification , Neurodegenerative Diseases/drug therapy , Neurodegenerative Diseases/virology , Protein Interaction Mapping , Proteome/agonists , Proteome/antagonists & inhibitors , Proteome/genetics , Proteome/metabolism , Small Molecule Libraries/chemistry , Small Molecule Libraries/therapeutic use , Virus Diseases/classification , Virus Diseases/drug therapy , Virus Diseases/virology
12.
J Chem Inf Model ; 60(12): 6007-6019, 2020 12 28.
Article in English | MEDLINE | ID: mdl-33259212

ABSTRACT

The rise of novel artificial intelligence (AI) methods necessitates their benchmarking against classical machine learning for a typical drug-discovery project. Inhibition of the potassium ion channel, whose alpha subunit is encoded by the human ether-à-go-go-related gene (hERG), leads to a prolonged QT interval of the cardiac action potential and is a significant safety pharmacology target for the development of new medicines. Several computational approaches have been employed to develop prediction models for the assessment of hERG liabilities of small molecules including recent work using deep learning methods. Here, we perform a comprehensive comparison of hERG effect prediction models based on classical approaches (random forests and gradient boosting) and modern AI methods [deep neural networks (DNNs) and recurrent neural networks (RNNs)]. The training set (∼9000 compounds) was compiled by integrating the hERG bioactivity data from the ChEMBL database with experimental data generated from an in-house, high-throughput thallium flux assay. We utilized different molecular descriptors including the latent descriptors, which are real-value continuous vectors derived from chemical autoencoders trained on a large chemical space (>1.5 million compounds). The models were prospectively validated on ∼840 in-house compounds screened in the same thallium flux assay. The best results were obtained with the XGBoost method and RDKit descriptors. The comparison of models based only on latent descriptors revealed that the DNNs performed significantly better than the classical methods. The RNNs that operate on SMILES provided the highest model sensitivity. The best models were merged into a consensus model that offered superior performance compared to reference models from academic and commercial domains. 
Furthermore, we shed light on the potential of AI methods to exploit the big data in chemistry and generate novel chemical representations useful in predictive modeling and tailoring a new chemical space.


Subject(s)
Ether-A-Go-Go Potassium Channels , Potassium Channel Blockers , Artificial Intelligence , Big Data , Drug Discovery , Humans , Potassium Channel Blockers/pharmacology
13.
Sci Rep ; 10(1): 20713, 2020 11 26.
Article in English | MEDLINE | ID: mdl-33244000

ABSTRACT

Hepatic metabolic stability is a key pharmacokinetic parameter in drug discovery. Metabolic stability is usually assessed in microsomal fractions and only the best compounds progress in the drug discovery process. A high-throughput single time point substrate depletion assay in rat liver microsomes (RLM) is employed at the National Center for Advancing Translational Sciences. Between 2012 and 2020, RLM stability data was generated for ~24,000 compounds from more than 250 projects that cover a wide range of pharmacological targets and cellular pathways. Although RLM stability is a crucial endpoint, little or no data exists in the public domain. In this study, computational models were developed for predicting RLM stability using different machine learning methods. In addition, a retrospective time-split validation was performed, and local models were built for projects that performed poorly with global models. Further analysis revealed inherent medicinal chemistry knowledge potentially useful to chemists in the pursuit of synthesizing metabolically stable compounds. In addition, we deposited experimental data for ~2500 compounds in the PubChem bioassay database (AID: 1508591). The global prediction models are made publicly accessible (https://opendata.ncats.nih.gov/adme). This is, to the best of our knowledge, the first publicly available RLM prediction model built using high-quality data generated at a single laboratory.


Subject(s)
Microsomes, Liver/metabolism , Pharmaceutical Preparations/metabolism , Animals , Computer Simulation , Databases, Factual , Drug Discovery/methods , High-Throughput Screening Assays/methods , Liver/metabolism , Machine Learning , Male , National Center for Advancing Translational Sciences (U.S.) , Quantitative Structure-Activity Relationship , Rats , Rats, Sprague-Dawley , Retrospective Studies , United States
14.
Methods Inf Med ; 59(4-05): 131-139, 2020 08.
Article in English | MEDLINE | ID: mdl-33147635

ABSTRACT

OBJECTIVE: In this study, we aimed to evaluate the capability of the Unified Medical Language System (UMLS) as one data standard to support data normalization and harmonization of datasets that have been developed for rare diseases. Through analysis of data mappings between multiple rare disease resources and the UMLS, we propose suggested extensions of the UMLS that will enable its adoption as a global standard in rare disease. METHODS: We analyzed data mappings between the UMLS and existing datasets on over 7,000 rare diseases that were retrieved from four publicly accessible resources: Genetic And Rare Diseases Information Center (GARD), Orphanet, Online Mendelian Inheritance in Man (OMIM), and the Monarch Disease Ontology (MONDO). Two types of disease mappings were assessed: (1) curated mappings extracted from those four resources; and (2) established mappings generated by querying the rare disease-based integrative knowledge graph developed in the previous study. RESULTS: We found that 100% of OMIM concepts, and over 50% of concepts from GARD, MONDO, and Orphanet were normalized by the UMLS and accurately categorized into the appropriate UMLS semantic groups. We analyzed 58,636 UMLS mappings, which resulted in 3,876 UMLS concepts across these resources. Manual evaluation of a random set of 500 UMLS mappings demonstrated a high level of accuracy (99%) in developing those mappings, which consisted of 414 synonym mappings (82.8%), 76 subtype mappings (15.2%), and five sibling mappings (1%). CONCLUSION: The mapping results in this study illustrate that the UMLS was able to accurately represent rare disease concepts and their associated information, such as genes and phenotypes, and can effectively be used to support data harmonization across existing resources developed for collecting rare disease data.
We recommend the adoption of the UMLS as a data standard for rare disease to enable the existing rare disease datasets to support future applications in clinical and community settings.


Subject(s)
Rare Diseases , Unified Medical Language System , Humans , Knowledge Bases , Rare Diseases/epidemiology , Rare Diseases/genetics , Semantics
15.
J Biomed Semantics ; 11(1): 13, 2020 11 12.
Article in English | MEDLINE | ID: mdl-33183351

ABSTRACT

BACKGROUND: The Genetic and Rare Diseases (GARD) Information Center was established by the National Institutes of Health (NIH) to provide freely accessible consumer health information on over 6500 genetic and rare diseases. As the cumulative scientific understanding and underlying evidence for these diseases have expanded over time, existing practices to generate knowledge from these publications and resources have not been able to keep pace. Through determining the applicability of computational approaches to enhance or replace manual curation tasks, we aim both to improve the sustainability and relevance of consumer health information and to develop a foundational database, from which translational science researchers may start to unravel disease characteristics that are vital to the research process. RESULTS: We developed a meta-ontology based integrative knowledge graph for rare diseases in Neo4j. This integrative knowledge graph includes a total of 3,819,623 nodes and 84,223,681 relations from 34 different biomedical data resources, including curated drug and rare disease associations. Semi-automatic mappings were generated for 2154 unique FDA orphan designations to 776 unique GARD diseases, and 3322 unique FDA designated drugs to UNII, as well as 180,363 associations between drug and indication from Inxight Drugs, which were integrated into the knowledge graph. We conducted four case studies to demonstrate the capabilities of this integrative knowledge graph in accelerating the curation of scientific understanding on rare diseases through the generation of disease mappings/profiles and pathogenesis associations. CONCLUSIONS: By integrating well-established database resources, we developed an integrative knowledge graph containing a large volume of biomedical and research data.
Demonstration of several immediate use cases and limitations of this process reveals both the potential feasibility and barriers of utilizing graph-based resources and approaches to support their use by providers of consumer health information, such as GARD, that may struggle with the needs of maintaining knowledge reliant on an evolving and growing evidence base. Finally, the successful integration of these datasets into a freely accessible knowledge graph highlights an opportunity to take a translational science view on the field of rare diseases by enabling researchers to identify disease characteristics, which may play a role in the translation of discoveries across different research domains.


Subject(s)
Biological Ontologies , Computer Graphics , Databases, Factual , Rare Diseases/genetics , Humans , Translational Research, Biomedical
16.
JMIR Med Inform ; 8(10): e18395, 2020 Oct 02.
Article in English | MEDLINE | ID: mdl-33006565

ABSTRACT

BACKGROUND: Although many efforts have been made to develop comprehensive disease resources that capture rare disease information for the purpose of clinical decision making and education, there is no standardized protocol for defining and harmonizing rare diseases across multiple resources. This introduces data redundancy and inconsistency that may ultimately increase confusion and difficulty for the wide use of these resources. To overcome such encumbrances, we report our preliminary study to identify phenotypical similarity among genetic and rare diseases (GARD) that present similar clinical manifestations, and to support further data harmonization. OBJECTIVE: To support rare disease data harmonization, we aim to systematically identify phenotypically similar GARD diseases from a disease-oriented integrative knowledge graph and determine their similarity types. METHODS: We identified phenotypically similar GARD diseases programmatically with 2 methods: (1) we measured disease similarity by comparing disease mappings between GARD and other rare disease resources, incorporating manual assessment; (2) we derived clinical manifestations presenting among sibling diseases from disease classifications and prioritized the identified similar diseases based on their phenotypes and genotypes. RESULTS: For disease similarity comparison, approximately 87% (341/392) of the identified phenotypically similar disease pairs were validated; 80% (271/392) of these disease pairs were accurately identified as phenotypically similar based on similarity score. The evaluation result shows a high precision (94%) and a satisfactory quality (86% F measure). By deriving phenotypical similarity from Monarch Disease Ontology (MONDO) and Orphanet disease classification trees, we identified a total of 360 disease pairs with at least 1 shared clinical phenotype and gene, which were applied for prioritizing clinical relevance.
A total of 662 phenotypically similar disease pairs were identified and will be applied for GARD data harmonization. CONCLUSIONS: We successfully identified phenotypically similar rare diseases among the GARD diseases via 2 approaches, disease mapping comparison and phenotypical similarity derivation from disease classification systems. The results will not only direct GARD data harmonization in expanding translational science research but will also accelerate data transparency and consistency across different disease resources and terminologies, helping to build a robust and up-to-date knowledge resource on rare diseases.
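
The reported precision (94%) and F measure (86%) together imply a recall value, provided the F measure is the balanced F1 score (the harmonic mean of precision and recall). The abstract does not say which F measure was used, so that is an assumption here, and the function names below are illustrative only. A minimal sketch of the arithmetic:

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def recall_from_f1(precision: float, f1: float) -> float:
    """Invert F1 = 2PR / (P + R) to recover the recall R."""
    denom = 2 * precision - f1
    if denom <= 0:
        raise ValueError("no valid recall for this precision/F1 pair")
    return f1 * precision / denom


# Values quoted in the abstract; F1 is assumed, not confirmed by the source.
implied_recall = recall_from_f1(precision=0.94, f1=0.86)
```

Under that assumption the implied recall is roughly 0.79, which may help when comparing this evaluation with others that report recall directly.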

17.
Environ Health Perspect ; 128(2): 27002, 2020 02.
Article in English | MEDLINE | ID: mdl-32074470

ABSTRACT

BACKGROUND: Endocrine disrupting chemicals (EDCs) are xenobiotics that mimic the interaction of natural hormones and alter synthesis, transport, or metabolic pathways. The prospect of EDCs causing adverse health effects in humans and wildlife has led to the development of scientific and regulatory approaches for evaluating bioactivity. This need is being addressed using high-throughput screening (HTS) in vitro approaches and computational modeling. OBJECTIVES: In support of the Endocrine Disruptor Screening Program, the U.S. Environmental Protection Agency (EPA) led two worldwide consortia to virtually screen chemicals for their potential estrogenic and androgenic activities. Here, we describe the Collaborative Modeling Project for Androgen Receptor Activity (CoMPARA) effort, which follows the steps of the Collaborative Estrogen Receptor Activity Prediction Project (CERAPP). METHODS: The CoMPARA list of screened chemicals built on CERAPP's list of 32,464 chemicals to include additional chemicals of interest, as well as simulated ToxCast™ metabolites, totaling 55,450 chemical structures. Computational toxicology scientists from 25 international groups contributed 91 predictive models for binding, agonist, and antagonist activity predictions. Models were underpinned by a common training set of 1,746 chemicals compiled from a combined data set of 11 ToxCast™/Tox21 HTS in vitro assays. RESULTS: The resulting models were evaluated using curated literature data extracted from different sources. To overcome the limitations of single-model approaches, CoMPARA predictions were combined into consensus models that provided an average predictive accuracy of approximately 80% for the evaluation set.
DISCUSSION: The strengths and limitations of the consensus predictions were discussed with example chemicals; then, the models were implemented into the free and open-source OPERA application to enable screening of new chemicals with a defined applicability domain and accuracy assessment. This implementation was used to screen the entire EPA DSSTox database of ∼875,000 chemicals, and their predicted AR activities have been made available on the EPA CompTox Chemicals dashboard and National Toxicology Program's Integrated Chemical Environment. https://doi.org/10.1289/EHP5580.
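
The idea of combining many single-model predictions into a consensus can be illustrated with an unweighted average followed by a threshold. The abstract does not describe how CoMPARA actually weighted or combined its 91 models, so the scheme, the chemical names, the scores, and the 0.5 threshold below are all assumptions made for illustration:

```python
from statistics import mean


def consensus_call(model_scores: list[float], threshold: float = 0.5) -> tuple[float, bool]:
    """Average per-model activity probabilities and apply a decision threshold."""
    if not model_scores:
        raise ValueError("need at least one model prediction")
    avg = mean(model_scores)
    return avg, avg >= threshold


# Hypothetical probabilities of androgen receptor activity from three models.
predictions = {
    "chem_A": [0.9, 0.7, 0.8],
    "chem_B": [0.2, 0.6, 0.1],
}

# Maps each chemical to (mean score, active call).
consensus = {name: consensus_call(scores) for name, scores in predictions.items()}
```

Averaging dampens the influence of any single outlying model, such as the 0.6 score for `chem_B`, which is one common motivation for consensus approaches.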


Subject(s)
Computer Simulation , Endocrine Disruptors , Androgens , Databases, Factual , High-Throughput Screening Assays , Humans , Receptors, Androgen , United States , United States Environmental Protection Agency
18.
Curr Protoc Bioinformatics ; 69(1): e92, 2020 03.
Article in English | MEDLINE | ID: mdl-31898878

ABSTRACT

Pharos is an integrated web-based informatics platform for the analysis of data aggregated by the Illuminating the Druggable Genome (IDG) Knowledge Management Center, an NIH Common Fund initiative. The current version of Pharos (as of October 2019) spans 20,244 proteins in the human proteome, 19,880 disease and phenotype associations, and 226,829 ChEMBL compounds. This resource not only collates and analyzes data from over 60 high-quality resources to generate these types, but also uses text indexing to find less apparent connections between targets, and has recently begun to collaborate with institutions that generate data and resources. Proteins are ranked according to a knowledge-based classification system, which can help researchers to identify less studied "dark" targets that could potentially be further illuminated. This is an important process for both drug discovery and target validation, as more knowledge can accelerate target identification, and previously understudied proteins can serve as novel targets in drug discovery. Two basic protocols illustrate the levels of detail available for targets and several methods of finding targets of interest. An Alternate Protocol illustrates the difference in available knowledge between less and more studied targets. © 2020 by John Wiley & Sons, Inc. Basic Protocol 1: Search for a target and view details. Alternate Protocol: Search for dark target and view details. Basic Protocol 2: Filter a target list to get refined results.


Subject(s)
Drug Discovery , Genome , Software , Breast Neoplasms/genetics , Drug Delivery Systems , Female , Genome-Wide Association Study , Humans , Ligands , Receptors, G-Protein-Coupled/metabolism
19.
J Cheminform ; 12(1): 21, 2020 Apr 07.
Article in English | MEDLINE | ID: mdl-33431020

ABSTRACT

Over the last few decades, chemists have become skilled at designing compounds that avoid cytochrome P450 (CYP)-mediated metabolism. Typical screening assays are performed in liver microsomal fractions, so the contribution of cytosolic enzymes can be overlooked until much later in the drug discovery process. Few data exist on cytosolic enzyme-mediated metabolism, and no reliable tools are available to help chemists design away from such liabilities. In this study, we screened 1450 compounds for liver cytosol-mediated metabolic stability and extracted transformation rules that might help medicinal chemists in optimizing compounds with these liabilities. In vitro half-life data were collected by performing in-house experiments in mouse (CD-1 male) and human (mixed gender) cytosol fractions. Matched molecular pairs analysis was performed in conjunction with qualitative structure-activity relationship modeling to identify chemical structure transformations affecting cytosolic stability. The transformation rules were prospectively validated on the test set. In addition, selected rules were validated on a diverse chemical library and the resulting pairs were experimentally tested to confirm whether the identified transformations could be generalized. The validation results, comprising nearly 250 library compounds and corresponding half-life data, are made publicly available. The datasets were also used to generate in silico classification models, based on different molecular descriptors and machine learning methods, to predict cytosol-mediated liabilities. To the best of our knowledge, this is the first systematic in silico effort to address cytosolic enzyme-mediated liabilities.
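
Matched molecular pairs analysis groups compound pairs that differ by a single structural change and summarizes how that change shifts a measured property. The sketch below assumes the pairs have already been identified and labeled; the transformation labels, half-life values, and the minimum-support cutoff are invented for illustration and do not come from the study:

```python
from collections import defaultdict
from math import log10
from statistics import mean

# (transformation label, half-life before, half-life after), in minutes.
# All values are made up for illustration.
pairs = [
    ("H>>F", 10.0, 20.0),
    ("H>>F", 5.0, 20.0),
    ("OMe>>OH", 30.0, 15.0),
]


def summarize_transformations(pairs, min_pairs=2):
    """Return {transformation: (pair count, mean log10 half-life ratio)}.

    Transformations supported by fewer than min_pairs pairs are dropped.
    """
    by_rule = defaultdict(list)
    for rule, t_before, t_after in pairs:
        by_rule[rule].append(log10(t_after / t_before))
    return {
        rule: (len(deltas), mean(deltas))
        for rule, deltas in by_rule.items()
        if len(deltas) >= min_pairs
    }


rules = summarize_transformations(pairs)
```

A positive mean log ratio indicates that the transformation tends to lengthen half-life in this toy data. Working on the log scale keeps a doubling and a halving symmetric around zero.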

20.
J Chem Inf Model ; 59(11): 4613-4624, 2019 11 25.
Article in English | MEDLINE | ID: mdl-31584270

ABSTRACT

Advances in the development of high-throughput screening and automated chemistry have rapidly accelerated the production of chemical and biological data, much of it freely accessible through literature aggregator services such as ChEMBL and PubChem. Here, we explore how to use this comprehensive mapping of chemical biology space to support the development of large-scale quantitative structure-activity relationship (QSAR) models. We propose a new deep learning consensus architecture (DLCA) that combines consensus and multitask deep learning approaches to generate large-scale QSAR models. This method improves knowledge transfer across different targets/assays while also integrating contributions from models based on different descriptors. The proposed approach was validated and compared with proteochemometrics, multitask deep learning, and Random Forest methods paired with various descriptor types. DLCA models demonstrated improved prediction accuracy for both regression and classification tasks. The best models, together with their modeling sets, are provided through publicly available web services at https://predictor.ncats.io.


Subject(s)
Deep Learning , Drug Discovery/methods , Quantitative Structure-Activity Relationship , Humans , Models, Biological , Online Systems , Software